Goto

Collaborating Authors

 phase transition



Clustering in Deep Stochastic Transformers

Fedorov, Lev, Sander, Michaël E., Elie, Romuald, Marion, Pierre, Laurière, Mathieu

arXiv.org Machine Learning

Transformers have revolutionized deep learning across various domains but understanding the precise token dynamics remains a theoretical challenge. Existing theories of deep Transformers with layer normalization typically predict that tokens cluster to a single point; however, these results rely on deterministic weight assumptions, which fail to capture the standard initialization scheme in Transformers. In this work, we show that accounting for the intrinsic stochasticity of random initialization alters this picture. More precisely, we analyze deep Transformers where noise arises from the random initialization of value matrices. Under diffusion scaling and token-wise RMS normalization, we prove that, as the number of Transformer layers goes to infinity, the discrete token dynamics converge to an interacting-particle system on the sphere where tokens are driven by a \emph{common} matrix-valued Brownian noise. In this limit, we show that initialization noise prevents the collapse to a single cluster predicted by deterministic models. For two tokens, we prove a phase transition governed by the interaction strength and the token dimension: unlike deterministic attention flows, antipodal configurations become attracting with positive probability. Numerical experiments confirm the predicted transition, reveal that antipodal formations persist for more than two tokens, and demonstrate that suppressing the intrinsic noise degrades accuracy.


Stability and Accuracy Trade-offs in Statistical Estimation

Chakraborty, Abhinav, Luo, Yuetian, Barber, Rina Foygel

arXiv.org Machine Learning

Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization, robustness, and replicability, and a variety of stability notions have been proposed in different learning settings. However, while stability entails desirable properties, it is typically not sufficient on its own for statistical learning -- and indeed, it may be at odds with accuracy, since an algorithm that always outputs a constant function is perfectly stable but statistically meaningless. Thus, it is essential to understand the potential statistical cost of stability. In this work, we address this question by adopting a statistical decision-theoretic perspective, treating stability as a constraint in estimation. Focusing on two representative notions-worst-case stability and average-case stability-we first establish general lower bounds on the achievable estimation accuracy under each type of stability constraint. We then develop optimal stable estimators for four canonical estimation problems, including several mean estimation and regression settings. Together, these results characterize the optimal trade-offs between stability and accuracy across these tasks. Our findings formalize the intuition that average-case stability imposes a qualitatively weaker restriction than worst-case stability, and they further reveal that the gap between these two can vary substantially across different estimation problems.


Cascade of phase transitions in the training of energy-based models

Neural Information Processing Systems

In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures, and end with numerical analysis of real trainings on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition, revealing a series of thermodynamic phase transitions that shape the principal learning modes of the empirical probability distribution. We first describe this process analytically in several controlled setups that allow us to fully monitor the training dynamics until convergence.


A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent

Neural Information Processing Systems

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when $N \to \infty$ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$. Based on these estimates, a precise characterization of two qualitatively different phases of learning, including the phase transition between them, is provided; and the corresponding double descent test error curve is derived from this phase transition behavior. These results do not depend on strong assumptions on the data distribution, and they perfectly match empirical results on real-world data sets.


Emergent Collective Memory in Decentralized Multi-Agent AI Systems

Khushiyant, null

arXiv.org Artificial Intelligence

We demonstrate how collective memory emerges in decentralized multi-agent systems through the interplay between individual agent memory and environmental trace communication. Our agents maintain internal memory states while depositing persistent environmental traces, creating a spatially distributed collective memory without centralized control. Comprehensive validation across five environmental conditions (20x20 to 50x50 grids, 5-20 agents, 50 runs per configuration) reveals a critical asymmetry: individual memory alone provides 68.7% performance improvement over no-memory baselines (1563.87 vs 927.23, p < 0.001), while environmental traces without memory fail completely. This demonstrates that memory functions independently but traces require cognitive infrastructure for interpretation. Systematic density-sweep experiments (rho in [0.049, 0.300], up to 625 agents) validate our theoretical phase transition prediction. On realistic large grids (30x30, 50x50), stigmergic coordination dominates above rho ~ 0.20, with traces outperforming memory by 36-41% on composite metrics despite lower food efficiency. The experimental crossover confirms the predicted critical density rho_c = 0.230 within 13% error.


Empirical Hardness in Multi-Agent Pathfinding: Research Challenges and Opportunities

Ren, Jingyao, Ewing, Eric, Kumar, T. K. Satish, Koenig, Sven, Ayanian, Nora

arXiv.org Artificial Intelligence

Multi-agent pathfinding (MAPF) is the problem of finding collision-free paths for a team of agents on a map. Although MAPF is NP-hard, the hardness of solving individual instances varies significantly, revealing a gap between theoretical complexity and actual hardness. This paper outlines three key research challenges in MAPF empirical hardness to understand such phenomena. The first challenge, known as algorithm selection, is determining the best-performing algorithms for a given instance. The second challenge is understanding the key instance features that affect MAPF empirical hardness, such as structural properties like phase transition and backbone/backdoor. The third challenge is how to leverage our knowledge of MAPF empirical hardness to effectively generate hard MAPF instances or diverse benchmark datasets. This work establishes a foundation for future empirical hardness research and encourages deeper investigation into these promising and underexplored areas.


Data-Driven Learnability Transition of Measurement-Induced Entanglement

Qian, Dongheng, Wang, Jing

arXiv.org Artificial Intelligence

Measurement-induced entanglement (MIE) captures how local measurements generate long-range quantum correlations and drive dynamical phase transitions in many-body systems. Yet estimating MIE experimentally remains challenging: direct evaluation requires extensive post-selection over measurement outcomes, raising the question of whether MIE is accessible with only polynomial resources. We address this challenge by reframing MIE detection as a data-driven learning problem that assumes no prior knowledge of state preparation. Using measurement records alone, we train a neural network in a self-supervised manner to predict the uncertainty metric for MIE--the gap between upper and lower bounds of the average post-measurement bipartite entanglement. Applied to random circuits with one-dimensional all-to-all connectivity and two-dimensional nearest-neighbor coupling, our method reveals a learnability transition with increasing circuit depth: below a threshold, the uncertainty is small and decreases with polynomial measurement data and model parameters, while above it the uncertainty remains large despite increasing resources. We further verify this transition experimentally on current noisy quantum devices, demonstrating its robustness to realistic noise. These results highlight the power of data-driven approaches for learning MIE and delineate the practical limits of its classical learnability.


Memory-Amortized Inference: A Topological Unification of Search, Closure, and Structure

Li, Xin

arXiv.org Artificial Intelligence

Contemporary ML separates the static structure of parameters from the dynamic flow of inference, yielding systems that lack the sample efficiency and thermodynamic frugality of biological cognition. In this theoretical work, we propose \textbf{Memory-Amortized Inference (MAI)}, a formal framework rooted in algebraic topology that unifies learning and memory as phase transitions of a single geometric substrate. Central to our theory is the \textbf{Homological Parity Principle}, which posits a fundamental dichotomy: even-dimensional homology ($H_{even}$) physically instantiates stable \textbf{Content} (stable scaffolds or ``what''), while odd-dimensional homology ($H_{odd}$) instantiates dynamic \textbf{Context} (dynamic flows or ``where''). We derive the logical flow of MAI as a topological trinity transformation: \textbf{Search $\to$ Closure $\to$ Structure}. Specifically, we demonstrate that cognition operates by converting high-complexity recursive search (modeled by \textit{Savitch's Theorem} in NPSPACE) into low-complexity lookup (modeled by \textit{Dynamic Programming} in P) via the mechanism of \textbf{Topological Cycle Closure}. We further show that this consolidation process is governed by a topological generalization of the Wake-Sleep algorithm, functioning as a coordinate descent that alternates between optimizing the $H_{odd}$ flow (inference/wake) and condensing persistent cycles into the $H_{even}$ scaffold (learning/sleep). This framework offers a rigorous explanation for the emergence of fast-thinking (intuition) from slow-thinking (reasoning) and provides a blueprint for post-Turing architectures that compute via topological resonance.


Why is topology hard to learn?

Oriekhov, D. O., Bergkamp, Stan, Jin, Guliuxin, Luna, Juan Daniel Torres, Zouggari, Badr, van der Meer, Sibren, Yazidi, Naoual El, Greplova, Eliska

arXiv.org Artificial Intelligence

Phase classification has become a prototypical benchmark for data-driven analysis of condensed matter physics. The type and complexity of the phase transition dictate the level of complexity of the algorithm one has to employ. This topic has been broadly explored, offering a menu of both supervised and unsupervised techniques ranging from simple clustering [1-3] to more complex machine learning methods [4-7]. The phase classification problem is most commonly posed like so: we allow our model to view a dataset that is both relevant and straightforwardly obtainable in the scenario we wish to study. We introduce this data set to a model that has no prior knowledge of underlying physics.